29 research outputs found

    Discovering Links for Metadata Enrichment on Computer Science Papers

    At the very beginning of compiling a bibliography, usually only basic information about an item, such as its title, authors and publication date, is known. To gather additional information about a specific item, one typically has to search the library catalog or use a web search engine. This look-up procedure requires manual effort for every single item of a bibliography. In this technical report we present a proof of concept that uses Linked Data technology for the simple enrichment of sparse metadata sets. This is done by discovering owl:sameAs links between an initial set of computer science papers and resources from external data sources such as DBLP, ACM and the Semantic Web Conference Corpus. We demonstrate how the link discovery tool Silk is used to detect additional information and to enrich an initial set of records in the computer science domain. The pros and cons of Silk as a link discovery tool are summarized at the end. Comment: 22 pages, 4 figures, 7 listings, presented at SWIB1
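The idea of discovering owl:sameAs links between sparse local records and an external source can be illustrated with a minimal Python sketch. This is not Silk and does not use Silk's linkage rule language; it is a simplified stand-in that compares normalized titles with the standard library's `difflib.SequenceMatcher`. The function names, the example URIs, and the 0.9 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def discover_same_as(local, external, threshold=0.9):
    """Return (local_uri, "owl:sameAs", external_uri) triples.

    `local` and `external` map URIs to titles. For each local record the
    best-scoring external title is linked if its similarity reaches the
    (assumed) threshold. Real link discovery tools combine several
    properties; this sketch compares titles only.
    """
    links = []
    for local_uri, local_title in local.items():
        best_uri, best_score = None, 0.0
        for ext_uri, ext_title in external.items():
            score = SequenceMatcher(
                None, normalize(local_title), normalize(ext_title)
            ).ratio()
            if score > best_score:
                best_uri, best_score = ext_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((local_uri, "owl:sameAs", best_uri))
    return links


# Hypothetical records: the URIs are placeholders, not real identifiers.
local_records = {
    "ex:paper1": "Discovering Links for  Metadata Enrichment on Computer Science Papers",
}
external_records = {
    "ext:a": "discovering links for metadata enrichment on computer science papers",
    "ext:b": "Applying Linked Data Technologies in the Social Sciences",
}
print(discover_same_as(local_records, external_records))
```

Once such a link exists, the additional metadata attached to the external resource can be merged into the local record, which is the enrichment step the abstract describes.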

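    TermPicker: Recommending Vocabulary Terms for Reuse When Modeling Linked Open Data (original title: TermPicker: Empfehlungen von Vokabulartermen für die Wiederverwendung beim Modellieren von Linked Open Data)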

    Reusing terms from Resource Description Framework (RDF) vocabularies when modeling data as Linked Open Data (LOD) is difficult and, without additional guidance, far from trivial. This work proposes and evaluates TermPicker: a novel approach that alleviates this situation by recommending vocabulary terms based on information about how other data providers modeled their data as LOD. TermPicker gathers such information and represents it via so-called schema-level patterns (SLPs), which are used to calculate a ranked list of RDF vocabulary term recommendations. The ranking of the recommendations is based either on the machine learning approach "Learning To Rank" (L2R) or on the data mining approach "Association Rule" mining (AR). TermPicker is evaluated in two ways. First, an automated cross-validation evaluates TermPicker's predictions based on the Mean Average Precision (MAP) and the Mean Reciprocal Rank at the first five positions (MRR@5). Second, a user study examines which of the recommendation methods (L2R vs. AR) better helps real users reuse RDF vocabulary terms in a practical setting. The participants, i.e., TermPicker's potential users, are asked to reuse vocabulary terms while modeling three data sets as LOD, receiving either L2R-based recommendations, AR-based recommendations, or no recommendations. The results of the cross-validation show that, using SLPs, TermPicker achieves 35% higher MAP and MRR@5 values than when using solely the features based on the typical reuse strategies. Both the L2R-based and the AR-based recommendation methods were able to calculate lists of recommendations with MAP = 0.75 and MRR@5 = 0.80. However, the results of the user study show that the majority of the participants favor the AR-based recommendations. The outcome of this work demonstrates that TermPicker alleviates the situation of searching for classes and properties used by other data providers on the LOD cloud for representing similar data.
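The two ranking metrics named in this abstract, MAP and MRR@5, can be sketched in a few lines of Python. This is a minimal illustration of the standard definitions, not TermPicker's evaluation code. It assumes that average precision is normalized by the total number of relevant terms (one common convention), and the function names and example terms are made up for the sketch.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant terms.

    Assumption: normalized by the number of relevant terms, so relevant
    terms missing from the list lower the score.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def reciprocal_rank_at_k(ranked, relevant, k=5):
    """1/rank of the first relevant term within the top k, else 0."""
    for rank, term in enumerate(ranked[:k], start=1):
        if term in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    """Arithmetic mean; 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical recommendation lists paired with their relevant terms.
runs = [
    (["foaf:name", "dc:title", "foaf:knows"], {"foaf:name", "foaf:knows"}),
    (["dc:creator", "dc:title"], {"dc:title"}),
]
map_score = mean(average_precision(r, rel) for r, rel in runs)
mrr_at_5 = mean(reciprocal_rank_at_k(r, rel, k=5) for r, rel in runs)
print(map_score, mrr_at_5)
```

For the two toy lists above, the average precisions are 5/6 and 1/2, giving a MAP of about 0.667, and the reciprocal ranks are 1 and 1/2, giving an MRR@5 of 0.75.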

    Living Lab Evaluation for Life and Social Sciences Search Platforms -- LiLAS at CLEF 2021

    Meta-evaluation studies of system performance in controlled offline evaluation campaigns, such as TREC and CLEF, show a need for innovation in evaluating IR systems. The field of academic search is no exception. This may be because relevance in academic search is multilayered, which makes user-centric evaluation increasingly important. The Living Labs for Academic Search (LiLAS) lab aims to strengthen the concept of user-centric living labs for the domain of academic search by allowing participants to evaluate their retrieval approaches in two real-world academic search systems from the life sciences and the social sciences. To this end, we provide participants with metadata on the systems' content as well as candidate lists, with the task of ranking the most relevant candidates at the top. Using the STELLA infrastructure, we allow participants to easily integrate their approaches into the real-world systems and make it possible to compare different approaches at the same time. Comment: 8 pages. Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021

    Applying Linked Data Technologies in the Social Sciences

    In recent years, Linked Open Data (LOD) has matured and gained acceptance across various communities and domains. Linked Data technologies are seen as having great potential for application in scientific disciplines. In this article, we present use cases and applications of Linked Data in the social sciences. They focus on (a) interlinking domain-specific information, and (b) linking social science data to external LOD sources (e.g. authority data) from other domains. However, several technical and research challenges arise when applying Linked Data technologies to a scientific domain with its specific data, information needs and use cases. We discuss these challenges and show how they can be addressed. (author's abstract)

    ELLIS: Interactive exploration of Linked Data on the level of induced schema patterns

    We present ELLIS, a demo for browsing the Linked Data cloud on the level of induced schema patterns. To this end, we define schema-level patterns of RDF types and properties to identify how entities described by type sets are connected by property sets. We show that schema-level patterns can be aggregated and extracted from large Linked Data sets using efficient algorithms for mining frequent item sets. A subsequent visualisation of such patterns enables users to quickly understand which type of information is modelled on the Linked Data cloud and how this information is interconnected.

    Overview of LiLAS 2020 -- Living Labs for Academic Search

    Academic search is a timeless challenge that the field of Information Retrieval has been dealing with for many years. Even today, the search for academic material is a broad field of research that recently started working on problems like the COVID-19 pandemic. However, test collections and specialized data sets like CORD-19 only allow for system-oriented experiments, while the evaluation of algorithms in real-world environments is only available to researchers from industry. In LiLAS, we open up two academic search platforms to allow participating researchers to evaluate their systems in a Docker-based research environment. This overview paper describes the motivation, the infrastructure, and the two systems, LIVIVO and GESIS Search, that are part of this CLEF lab. Comment: Manuscript version of the CLEF 2020 proceedings paper

    Social Network Analysis with Digital Behavioral Data (original title: Analyse sozialer Netzwerke mit digitalen Verhaltensdaten)

    The use of digital technologies, such as social media platforms, leaves behind vast amounts of behavioral traces that are of great interest to social research. Other digital technologies, such as mobile phones, make it possible to collect behavioral traces for research purposes. These digital behavioral data consist of genuinely relational ties that can be viewed as networks. This kind of data requires a shift in perspective from individuals to micro-events (e.g., a social media post) as units of observation, and it places established techniques, such as social network analysis, at center stage. We argue that applying this approach, inferring individual attributes and attitudes, and uncovering micro-macro behavioral dynamics by analyzing patterns are potentially useful applications. We discuss methodological challenges and conclude that social theory is a cornerstone for consolidating computational social science.

    TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud - An Extended Technical Report

    Deciding which vocabulary terms to use when modeling data as Linked Open Data (LOD) is far from trivial. Choosing vocabulary terms that are too general, or terms from vocabularies that are not used by other LOD datasets, is likely to lead to a data representation that is harder for humans to understand and for Linked Data applications to consume. In this technical report, we propose TermPicker: a novel approach for vocabulary reuse that recommends RDF types and properties by exploiting information on how other data providers on the LOD cloud use RDF types and properties to describe their data. To this end, we introduce the notion of so-called schema-level patterns (SLPs). They capture how sets of RDF types are connected via sets of properties within some data collection, e.g., within a dataset on the LOD cloud. TermPicker uses such SLPs to generate a ranked list of vocabulary terms for reuse. The lists of recommended terms are ordered by a ranking model that is computed using the machine learning approach Learning To Rank (L2R). TermPicker is evaluated based on the recommendation quality, measured using the Mean Average Precision (MAP) and the Mean Reciprocal Rank at the first five positions (MRR@5). Our results show an improvement in recommendation quality of 29% to 36% when using SLPs compared to the previously investigated baselines of recommending solely popular vocabulary terms or terms from the same vocabulary. The overall best results are achieved using SLPs in conjunction with the Learning To Rank algorithm Random Forests.
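The notion of a schema-level pattern, as described in this abstract, can be made concrete with a small Python sketch. It is a simplified reading of the description (sets of RDF types connected via sets of properties) and not the report's actual extraction procedure. The triple representation as plain string tuples, the function name, and the example data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


def schema_level_patterns(triples):
    """Count patterns of the form (subject types, properties, object types).

    Assumption: one pattern is formed per connected (subject, object) pair,
    using all properties that link that pair. Objects without rdf:type
    statements (e.g. literals) get an empty type set.
    """
    types = defaultdict(set)   # resource -> set of its RDF types
    links = defaultdict(set)   # (subject, object) -> connecting properties
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            links[(s, o)].add(p)

    patterns = Counter()
    for (s, o), props in links.items():
        pattern = (
            frozenset(types.get(s, ())),
            frozenset(props),
            frozenset(types.get(o, ())),
        )
        patterns[pattern] += 1
    return patterns


# Hypothetical toy dataset.
toy_triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
]
for (subject_types, props, object_types), count in schema_level_patterns(toy_triples).items():
    print(sorted(subject_types), sorted(props), sorted(object_types), count)
```

In this toy dataset both links collapse into a single pattern that connects foaf:Person to foaf:Person via foaf:knows, counted twice. Counts of this kind are the sort of usage information a recommender could rank terms by.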